Abstract

In ensemble (or bulk) quantum computation, 
measurements of qubits in an individual com- 
puter cannot be performed. Instead, only ex- 
pectation values can be measured. As a re- 
sult of this limitation on the model of com- 
putation, various important algorithms cannot 
be processed directly on such computers, and 
must be modified. We provide modifications of 
various existing protocols, including algorithms 
for universal fault-tolerant computation, Shor's 
factorization algorithm (which can be extended 
to any algorithm computing an NP function), 
and some search algorithms to enable process- 
ing them on ensemble quantum computers. 

1 Introduction 

Quantum computing is a new type of computing 
which uses the properties of quantum mechan- 
ics to suggest fast algorithms to several impor- 
tant problems. For example, Shor's algorithm 
[21] for factoring large numbers is exponentially 
faster than any known classical algorithm. Similarly,
by utilizing Grover's algorithm [11] it is
possible to search a database of size $N$ in time
$O(\sqrt{N})$, instead of $O(N)$ in the classical setting.

NMR computing, first suggested by Cory,
Fahmy and Havel, and by Gershenfeld
and Chuang, is currently the most promising
implementation of quantum computing, and
several quantum algorithms involving only a few
qubits have been demonstrated in the laboratory.
In such NMR systems, each
molecule is used as a computer. Different qubits
in the computer are represented by spins of
different nuclei. Many identical molecules (in
fact, a macroscopic number) are used in parallel;
hence, this model is called the ensemble or
bulk quantum computation model. In such bulk
models, qubits in a single computer cannot be
measured, and only expectation values of a particular
bit over all the computers can be read
out^1.

The impossibility of performing measure- 
ments on the individual computers causes se- 
vere limitations on ensemble quantum computa- 
tion. It was generally assumed that rather sim- 
ple strategies of delaying (or avoiding) measure- 
ments can be used to bypass these limitations 
and to enable the implementation of all quantum
algorithms. We find, however, that for a
scalable measurement model such strategies are
insufficient for many algorithms (including
Shor's factorization algorithm and fault-tolerant
computation).



1 Reading the state of $n$ qubits together, as done in
many current experiments, is not scalable since it
requires distinguishing among $2^n$ states.
We briefly address other problems related
to NMR-computation, namely, the addressing 
problem and the pseudo-pure-state (PPS) scal- 
ing problem, in Appendix A. In the rest of 
this paper we restrict ourselves to issues related 
solely to the ensemble measurement prob- 
lem. While the results here are vital for bulk 
computation, the specific results obtained re- 
garding universal and fault-tolerant sets of gates 
might also be important for other implementa- 
tions of quantum computing devices where de- 
laying measurements is desired. 

2 The measurement in ensemble quantum computation

The measurement process in quantum mechanics
can be described simply as follows: To measure
the state of a qubit, say $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$,
in the computation basis ($|0\rangle$; $|1\rangle$), one measures
the Hermitian operator (the observable)

$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

to get the outcome $\lambda_0 = 1$
with probability $|\alpha|^2$ and $\lambda_1 = -1$ with probability
$|\beta|^2$. In an NMR ensemble model, the
corresponding qubit in every computer is measured
simultaneously, resulting in the expectation
value, i.e., the outcome of the measurement
is a signal of strength proportional to $|\alpha|^2 - |\beta|^2$.
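The difference between the two readout models can be illustrated numerically. The following is a minimal sketch, assuming an idealized noise-free ensemble of identical computers; the function names and the sample amplitudes are illustrative choices, not part of the original text.

```python
import random

def single_measurement(alpha, beta, rng):
    """Projective sigma_z measurement on ONE computer: +1 with prob |alpha|^2, else -1."""
    p0 = abs(alpha) ** 2
    return 1 if rng.random() < p0 else -1

def ensemble_signal(alpha, beta, copies, rng):
    """Ensemble readout: the average of sigma_z over many identical computers."""
    total = sum(single_measurement(alpha, beta, rng) for _ in range(copies))
    return total / copies

def exact_expectation(alpha, beta):
    """The quantity the ensemble signal is proportional to: |alpha|^2 - |beta|^2."""
    return abs(alpha) ** 2 - abs(beta) ** 2

rng = random.Random(0)
alpha, beta = 0.6, 0.8  # normalized: 0.36 + 0.64 = 1
one_shot = single_measurement(alpha, beta, rng)      # a random +1 or -1
signal = ensemble_signal(alpha, beta, 100_000, rng)  # close to the expectation value
print(one_shot, signal, exact_expectation(alpha, beta))
```

A single computer returns a random eigenvalue, whereas the ensemble returns only the average, from which individual outcomes cannot be recovered.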

Clearly, when the outcome of a measurement 
is expected to be the same on each of the com- 
puters, the ensemble measurement is as good 
as the standard (single computer) measurement. 
Usually, this is not the case. Hence, if the mea- 
surement process could yield different results for 
different individual computers, one would ex- 
pect that the corresponding algorithm will need 
modifications in order to run on an ensemble 
computer. 

The measurement problem is easily demon- 
strated in two cases: 

Random number generator (RNG): Using a 
single qubit one can easily create an RNG. 



To create a binomial probability distribution
with parameter $p$ one prepares a state
$\sqrt{p}\,|0\rangle + \sqrt{1-p}\,|1\rangle$, and measures in the computational
basis to obtain the desired RNG. This, as far as
we know, cannot be done on an ensemble quantum
computer, where only the expectation value
$p\lambda_0 + (1-p)\lambda_1$ can be classically monitored. It is
not yet clear whether any algorithm which uses
an RNG as a subroutine can still be operated,
e.g., using a qubit in a state $\sqrt{p}\,|0\rangle + \sqrt{1-p}\,|1\rangle$
as a control bit of the entire process that
follows the creation of a random number.

Teleportation: Standard teleportation can 
easily be performed on a three qubit quantum 
computer, but strictly speaking, it cannot be 
performed on an ensemble quantum computer. 
This is because a direct Bell-state measurement 
of the ensemble quantum computer is compu- 
tationally useless: each computer will yield a 
random result (of the Bell measurement), and 
on average the outcome is $(1/2)\lambda_0 + (1/2)\lambda_1$ for
each of the two measured qubits; hence, there is
no way to decide how to rotate the third qubit in
each individual computer. Yet, a fully-quantum
teleportation can be, and has been [17], performed
on an ensemble
quantum computer: in this fully-quantum
teleportation, the measurement of an individual 
computer is never monitored, and a classically- 
controlled rotation of the third qubit is replaced 
by a quantum control operation, in which the 
control qubits dephase before being used. 
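The replacement of classically controlled rotations by quantum control operations can be checked on a small state-vector simulation. This is a sketch under simplifying assumptions: a noiseless three-qubit register, no explicit dephasing of the control qubits (dephasing them in the computational basis would not change the state of the third qubit), and helper names that are purely illustrative.

```python
import math

N_QUBITS = 3  # qubit 0: unknown state; qubits 1 and 2: shared Bell pair

def apply_gate(state, gate, target, control=None):
    """Apply a 2x2 gate to `target`; if `control` is given, act only where it is 1.
    Qubit 0 is the most significant bit of the basis-state index."""
    t_shift = N_QUBITS - 1 - target
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        if control is not None and not (i >> (N_QUBITS - 1 - control)) & 1:
            out[i] += amp  # control bit is 0: component is left unchanged
            continue
        old = (i >> t_shift) & 1
        for new in (0, 1):
            j = (i & ~(1 << t_shift)) | (new << t_shift)
            out[j] += gate[new][old] * amp
    return out

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def teleport_without_measurement(a, b):
    """Teleport a|0> + b|1> from qubit 0 to qubit 2 using only unitary gates."""
    state = [0j] * 8
    state[0b000], state[0b100] = a, b
    state = apply_gate(state, H, 1)
    state = apply_gate(state, X, 2, control=1)  # Bell pair on qubits 1 and 2
    state = apply_gate(state, X, 1, control=0)  # rotate Bell basis to computational basis
    state = apply_gate(state, H, 0)
    state = apply_gate(state, X, 2, control=1)  # controlled corrections replace the
    state = apply_gate(state, Z, 2, control=0)  # classically controlled rotations
    return state

def reduced_density_of_last_qubit(state):
    """Trace out qubits 0 and 1."""
    rho = [[0j, 0j], [0j, 0j]]
    for rest in range(4):
        for i in (0, 1):
            for j in (0, 1):
                rho[i][j] += state[2 * rest + i] * state[2 * rest + j].conjugate()
    return rho

rho = reduced_density_of_last_qubit(teleport_without_measurement(0.6, 0.8j))
print(rho)
```

Every computer in the ensemble ends with the third qubit in the same state, so no individual measurement outcome has to be read.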

The current algorithms, which have several 
possible measurement outcomes, can be sorted 
into four groups based on the processing which 
follows the measurement, and the possibility of 
avoiding the measurement. When implemented
on ensemble computers, each of the four groups
requires a different adaptation strategy for the
algorithm:

1. For a particular "desired" outcome of the
entire algorithm, there is more than one
"good"/"desired" outcome of an intermediate
measurement. An additional algorithmic
step is then used to derive the desired
final result, and this algorithmic step can
be replaced by a controlled operation [e.g.,
error-recovery, in which the final result is
the corrected qubit; Shor's factoring algorithm,
in which the same candidate for the
"order" is obtained from different intermediate
measurement outcomes].

2. The algorithm has more than one correct 
final outcome and no further processing is 
done [e.g., Grover's search algorithm with 
several solutions] 

3. The algorithm has more than one final
result, and some of the results are
bad/undesired solutions. The algorithm
is repeated when a bad solution is obtained
[e.g., a wrong factor obtained in Shor's
factoring algorithm].

4. The measurement step of the algorithm 
ought to be replaced with available control 
operations (as in the first case), but such 
controlled operation cannot be performed 
[e.g., fault-tolerant universal computation]. 

The first case was recognized before in the
seminal work of Gershenfeld and Chuang.
When the outcomes of a measurement on various
computers are not the same, it might be the
case that the different measurement outcomes
can be processed by a classical algorithm such
that a unique final answer is obtained. For such
algorithms, one can simply delay (or even avoid)
the measurements and incorporate the algorithmic
step, which follows the measurement, into
the quantum algorithm (as a controlled operation).
This modified algorithm will now yield
a unique answer on all the computers. It was
generally assumed that such strategies of delaying
measurements can be used to save all quantum
algorithms. In fact, the strategy's success
is restricted to the cases where the measurements
can be delayed, the final outcome
is unique, and the final outcome is always
obtained^2. Indeed, this strategy works for the
error recovery process. The other cases explained
above require major modifications of the
algorithms.

In the case of algorithms yielding several final 
good results (case (2)), we suggest reordering 
techniques that provide unique solutions. Im- 
plementation of search algorithms in the case 
of multiple solutions on ensemble computers re- 
quires such modifications (derived in Section 
4.3). We note that for a measurement model 
where all the $2^n$ states of an $n$-qubit system can
be distinguished, the multiple-solutions case is 
not a problem. However, as noted earlier, such 
a scheme is not practical for any algorithm in- 
volving even tens of qubits, and the exponential 
resolution requirement makes it no better than 
a classical computer. 

In the case of algorithms having good and 
bad outcomes (case (3)), we show in Section 4.1 
cases where we can solve the problem by replac- 
ing bad results by random data, which do not 
interfere with the reading of the good result. 
Previous work by Gershenfeld and Chuang
noted that Shor's factorization algorithm can be
implemented on ensemble quantum computers,
by solving the problem as in case (1). However,
in addition to problem (1), Shor's algorithm (on
an ensemble computer) suffers also from problem
(3), and hence the modified algorithm suggested
in that work is not sufficient. Hence, the algorithm
requires a further modification (the
randomizing-bad-results strategy) in order to work in the
general case. Alternatively, one might be able
to control-repeat the computation in case the
classical verification showed that the algorithm
yielded a bad output; unfortunately, such a strategy
is not easily implementable and cannot be
easily justified; furthermore, it leads to a much
longer computation process, and hence to higher
sensitivity to errors.

2 Although we use the term "delayed" measurements,
in our modified algorithms sometimes the measurements
are not needed at all (and not merely delayed).

Case (4) is of pivotal importance to realistic
quantum computation. The schemes proposed
so far for quantum fault-tolerant computation
usually use an incomplete set of gates, i.e., a set
of gates that does not generate a dense subset
of the group of unitary operations. In order to
complete the set to a universal set, the schemes
use interactions with ancilla qubits, which are
then measured [15]. Each such measurement
is followed by an application of a unitary
operation, $U_j$, that depends on the outcome
of the measurement ($j$). A direct scheme
for removing such measurements (followed by
the required unitary operations $U_j$), and replacing
them by controlled operations, $\Lambda(U_j)$, will
not in general be realizable. This is because
$\Lambda(U_j)$ might not be realizable by the incomplete
set of fault-tolerant gates. For example, if
one attempts to remove measurements in Shor's
scheme for fault-tolerant realization of the Toffoli
gate [22], then the corresponding controlled
operations would themselves require Toffoli gates! We
believe that this issue was not explicitly addressed
in previous works, and we show for the
first time how an analysis of error propagation
and careful design of classical reversible circuits
can allow one to delay measurements in a
fault-tolerant manner.

In a prior work, addressing case (4), 
Aharonov and Ben-Or [1] have observed that
the measurements required for fault tolerant 
computation can be substituted by reversible 
classical circuits performing controlled opera- 
tions. In this paper we give an explicit descrip- 
tion of this process and study in detail the pro- 
cess of error propagation and how it can be han- 
dled in the resulting circuit. Knill, Laflamme, 



and Zurek [14] followed a different approach 
that potentially does not require measurements. 
However, to the best of our knowledge, this ap- 
proach is incomplete and a proof of universal 
fault-tolerant computation is not yet available. 
For example, a measurement-free implementa- 



tion of the Hadamard gate using that approach 
has not been demonstrated. 

Finally, Peres also discusses the possibility
of measurement-free encoding and decoding
procedures in quantum error-correction. How- 
ever, in his scheme the quantum information is 
transformed to a single qubit, while we suggest 
a method that is suitable for fault-tolerant com- 
putation. 

3 Obtaining a universal and fault-tolerant set of gates

The idea of quantum fault-tolerant computation
can be described briefly as
follows. Suppose that we have a noisy quantum
circuit $C$ which we want to simulate by a
fault-tolerant circuit $\tilde{C}$. In one level of such a circuit,
the regular bits are replaced by logical bits $|0\rangle_L$
and $|1\rangle_L$, where these are some entangled states
of a block of physical qubits. While $C$ operates
on data qubits, in the circuit $\tilde{C}$ all operations
are performed on encoded data, i.e., each data
qubit or a set of data qubits is represented as a
block of qubits that belongs to some quantum
error-correcting code. Then each operation of
$C$ performed by a gate $g_j$ is simulated by a
procedure (subcircuit) $\tilde{g}_j$ in the circuit $\tilde{C}$ such that
in $\tilde{g}_j$ each computation transforms codewords
to codewords. In order to avoid accumulation
of errors, after each computation in $\tilde{g}_j$ a
"correction procedure" is performed to correct any
error that is introduced in that computation. So
in the fault-tolerant circuit $\tilde{C}$ each computation
step is followed by a correction step.

The operations on the encoded qubits intro- 
duce a large number of additional gates and 
qubits, and unless one is careful, it is possi- 
ble that more errors are introduced than can 
be corrected by the code. To avoid any such 
catastrophic accumulation of errors, it is desir- 
able that the operations in the fault-tolerant cir- 
cuits prevent "spreading of errors" by making 
sure that each gate error causes a single error in
each block. It is useful now to review how errors 
propagate in quantum circuits. For example, 
consider the CNOT (controlled-not) gate which
performs the operation

$$|a\rangle_c\,|b\rangle_t \to |a\rangle_c\,|a \oplus b\rangle_t$$

in the computation basis; for the rest of this
paper, we shall drop the subscripts $c$ (control)
and $t$ (target) and designate the control bit
as the one on the left side. Clearly, applying
a CNOT operation from one bit to many target
bits can propagate one bit error from the control
bit to all the target bits. On the other
hand, applying CNOT from many control bits
to one target bit can propagate one phase error
from the target bit to all the control bits. It is
easy to observe this "back" propagation of the
phase errors: if we apply CNOT on the state
$(|0\rangle + |1\rangle) \otimes (|0\rangle + |1\rangle)$ and there is a phase error
in the target qubit, we will get

$$|0\rangle \otimes (|0\rangle - |1\rangle) + |1\rangle \otimes (|1\rangle - |0\rangle) = (|0\rangle - |1\rangle) \otimes (|0\rangle - |1\rangle),$$



which results in a phase error in the control 
qubit. Hence, fault-tolerant computation re- 
quires that this gate be applied only in the case 
where the control qubit $|a\rangle$ and the target qubit
$|b\rangle$ belong to different blocks. Furthermore, this
error-propagation phenomenon is also true for 
other controlled operations, and this motivated 
a sufficient condition for fault tolerance: only 
perform bitwise operations or transversal opera- 
tions on qubits within a code. It is, however, not 
a necessary condition for fault-tolerance, and 
careful constructions may allow one to apply 
control gates from many control bits onto one 
target bit, without destroying the fault-tolerant 
computation, to resolve the catch-22 problem 
we observe in the following discussions. 
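The "back" propagation of a phase error through a CNOT gate can be verified directly on the four amplitudes of a two-qubit state. This is a minimal sketch; the helper names are illustrative, and the control qubit is taken to be the left one, as in the text.

```python
import math

def kron(u, v):
    """Tensor product of two single-qubit amplitude vectors (left qubit first)."""
    return [x * y for x in u for y in v]

def z_on_target(state):
    """Phase error (sigma_z) on the right-hand (target) qubit."""
    a00, a01, a10, a11 = state
    return [a00, -a01, a10, -a11]

def cnot(state):
    """CNOT with the left qubit as control: swaps the |10> and |11> amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

s = 1 / math.sqrt(2)
plus, minus = [s, s], [s, -s]

# Start in |+>|+>, suffer a phase error on the target, then apply CNOT.
after = cnot(z_on_target(kron(plus, plus)))
expected = kron(minus, minus)  # the control qubit has also acquired a phase error
print(after, expected)
```

The control qubit ends in $|0\rangle - |1\rangle$ although the error occurred only on the target, which is why many-controls-to-one-target constructions need care.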

To get a quantum fault-tolerant computation,
it is enough to show that for a universal
set of quantum gates the above mentioned procedure
on the encoded data is possible. Quantum
fault-tolerant schemes usually depend
on measurements to ensure
that the set of the operations permissible on
encoded data (i.e., codewords in a quantum
error-correcting code) is actually a universal
set. Some of the gates in the universal set do
not require measurements, e.g., the operations
$H$, $\sigma_z^{1/2}$, and CNOT. [For CSS codes, each
of these logical gates can simply be achieved by
performing the same gate bit-wise on the individual
qubits (e.g., $H$ is achieved on codewords
via applying $H$ on individual qubits), but the
bit-wise $\sigma_z^{1/2}$ yields a $\sigma_z^{3/2}$ logical gate, hence
requires an additional step of bit-wise $\sigma_z$ to yield
the desired logical gate.] In existing suggestions
(except [14], as previously explained), at least
one gate (e.g., Toffoli in [22], or $\sigma_z^{1/4}$)
requires measurements.



There is always a simple scheme that potentially
allows one to postpone measurements of
ancilla qubits in quantum computation. Recall
that a measurement is followed by an operation
$U_j$, which is a unitary operation performed
on the data based on the outcome of a measurement
on the ancilla qubits (and $U_j$ can be
performed fault-tolerantly using the given,
non-universal, set of operations). As explained in
Section 2, the scheme for delaying the measurement
can be successfully implemented only if
the controlled operations $\Lambda(U_j)$ are in the set
of available measurement-free operations; i.e.,
these control operations can be implemented on
encoded data fault-tolerantly and directly without
using any measurements. However, in the
cases investigated so far, it is not the case that
the required controlled operations $\Lambda(U_j)$ are
implementable in a direct fault-tolerant manner.
For instance, in Shor's fault-tolerant set of gates
[22], a measurement is required for the preparation
of a Toffoli gate, but a Toffoli gate is required
if we want to delay that measurement.
This is because the measurement is followed
by a controlled-NOT operation, and hence can
only be replaced by a controlled-controlled-NOT,
which is a Toffoli gate. This seems like
a catch-22 situation!

However, the solution comes from the vital 
observation that some operations need protec- 
tion only from the bit errors, and do not need 
to use full quantum codes. 

By replacing the "quantum ancilla" (in a logical
basis $|0\rangle_L$ and $|1\rangle_L$) by a "classical ancilla"
in a "classical" basis $|\bar{0}\rangle = |0\cdots0\rangle$ and
$|\bar{1}\rangle = |1\cdots1\rangle$, we can use the classical ancilla to
perform $\Lambda(U_j)$ in a fault-tolerant manner, and
this can be done in the two cases where the outcomes
are the Toffoli gate required for Shor's
basis, and the $\sigma_z^{1/4}$ gate required for the
alternative basis. One can interpret the classical basis as
the classical repetition code. We call the ancilla
in these states "classical" since a classical
error-correction code can be used to correct bit errors
in it. Clearly, phase errors are not corrected in
the classical ancilla, yet we found that the use
of such a classical ancilla is still good enough for
our purpose.

Replacing Measurements of Encoded 
Ancilla Qubits: 

In the following we shall replace the measurement
of the quantum ancilla followed by the
operation $U$ acting on the quantum data, by
a sequence of operations: we copy the two basis
states of a quantum ancilla into a classical
ancilla, we perform classical error correction on
the classical ancilla, and we use the classical
ancilla as the control bit for performing the operation
$\Lambda(U_j)$ with the quantum data as the target
bit.

The measurement of the quantum ancilla in
the original protocol is done as follows [19]:
measure each of the physical qubits, and perform
a classical error correction on the outcome
of this measurement to determine the state of
the ancilla. For example, if the 7-bit CSS code
[22] is used to encode data, then a measurement
will yield a possibly corrupted codeword of a
classical 7-bit Hamming code. After classical
error correction, if the parity of the codeword
is "even" then the ancilla has collapsed to the
state $|0\rangle_L$, otherwise to the state $|1\rangle_L$. Classical
error correction is enough because phase errors
before a measurement will not change the outcome
probabilities.

As a first step toward removing such a measurement,
we propose a new gate that copies an
encoded quantum ancilla word onto a classical
ancilla:

$$\mathcal{N} : \begin{cases} |0\rangle_L \otimes |0\cdots0\rangle \to |0\rangle_L \otimes |0\cdots0\rangle \\ |1\rangle_L \otimes |0\cdots0\rangle \to |1\rangle_L \otimes |1\cdots1\rangle \end{cases} \qquad (1)$$

3 Similarly, in the fault-tolerant universal set of gates
based on the $\sigma_z^{1/4}$ gate, the generation of the $\sigma_z^{1/4}$ gate without
measurements leads to a catch-22 problem; a $\sigma_z^{1/2}$ gate
(which follows the measurement) needs to be replaced by
a $\Lambda(\sigma_z^{1/2})$ gate, which is not available as long as the $\sigma_z^{1/4}$
gate is not available.

Let $\mathcal{N}$ be a unitary operation that implements
the above transformation. [We show in the
next subsection that this operation can be done
fault-tolerantly.]
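The measurement-based readout that the copying gate is meant to replace (measure the seven physical bits, correct one bit error classically, then take the parity) can be sketched as follows. The sketch assumes the common convention in which the parity-check columns of the [7,4] Hamming code are the binary representations of 1 to 7; the function names are illustrative.

```python
def hamming_syndrome(bits):
    """Syndrome of a 7-bit word; with parity-check columns 1..7 (in binary),
    a nonzero syndrome equals the 1-based position of a single flipped bit."""
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position
    return syndrome

def decode_logical(bits):
    """Correct up to one bit error, then return the parity of the codeword:
    0 (even) is read as the logical 0 state, 1 (odd) as the logical 1 state."""
    corrected = list(bits)
    syndrome = hamming_syndrome(corrected)
    if syndrome:
        corrected[syndrome - 1] ^= 1
    return sum(corrected) % 2

even_word = [0, 0, 0, 1, 1, 1, 1]  # weight-4 Hamming codeword
odd_word = [1, 1, 1, 0, 0, 0, 0]   # weight-3 Hamming codeword
corrupted = list(odd_word)
corrupted[4] ^= 1                  # a single bit error at position 5
print(decode_logical(even_word), decode_logical(odd_word), decode_logical(corrupted))
```

Only bit errors enter this decoding, which matches the observation that phase errors before a measurement do not change the outcome probabilities.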

With this operation ($\mathcal{N}$), the quantum bit is
"copied" onto the classical ancilla. Since the
repetition code can only correct for bit errors in
the classical ancilla, one must make sure that
the classical ancilla can still be used to perform
$\Lambda(U_j)$ without putting the quantum data
in jeopardy. This, however, is not a problem,
since phase errors are transmitted from target
bit to control bit, hence cannot be transmitted
from the classical ancilla (control) to the quantum
data (target). This leads to the most interesting
and possibly counterintuitive aspect
of our scheme: the data in the classical repetition
code, or any classical function of this data,
can act as control bits in a bitwise controlled-$U$
operation onto quantum data.

We shall show later two cases where indeed
the operations between the classical ancilla and
the quantum data can be performed bit-wise
while the same operations cannot be performed
bit-wise between the quantum ancilla and the quantum
data (as the naive solution of delaying measurements
would have suggested).

Figure 1: The operation $\mathcal{N}_1$. Note that the circuit shows the generation of only one classical
target bit $|b\rangle$; the operations on the last bit have to be repeated to generate multiple target bits.

Note that the quantum data may add phase 
errors to the repetition code, but that is of no 
concern to us, since also in the "measured" case, 
the classical repetition code has lost phase co- 
herence. If there are t bit errors in the repetition 
code, it will result in t errors in the quantum 
data. Fortunately, bit errors are corrected in the 
repetition code. Hence, the operation $\mathcal{N}$ enables
one to create universal bases without measure- 
ment. 

The operation $\mathcal{N}$: quantum-to-classical
controlled-NOT. In Figure 1, we present
a circuit that computes the operation $\mathcal{N}_1$ for the
seven-bit CSS code, where $\mathcal{N}_1$ stands for Eq. (1)
with only one bit of the classical ancilla. The
syndrome ancilla bits are used to prevent the
spread of one bit error from the quantum ancilla
into the classical bit. Only two errors (in
any of the inputs, the gates or the time steps)
shall yield an error in the classical bit.



This is not the complete circuit; in the complete
circuit, the same computation on the bottom
four bits is repeated $n$ times, where $n$ is the
number of qubits in a codeword. At each repetition
stage the syndrome bits are discarded,
and another bit $b_i$ is created ($1 \le i \le n$). In
principle, the syndrome bits could be ignored,
reset, or measured. These bits will not affect
the operation beyond their use as a form of error
detection in the codeword. The bits $b_i$ are
then corrected (to yield the classical 0 or 1) using
a majority vote.

The circuit $\mathcal{N}_1$ flips the bit $b$ if the quantum
ancilla (acting here as a control bit) is $|1\rangle_L$, and
does nothing otherwise. This circuit operates
properly as long as there is up to one bit error
in the quantum data (there can actually be an
unlimited number of phase errors). Note that
phase errors in the lower part will spread to the
quantum ancilla; however, this is of no consequence,
since the quantum ancilla never interacts
with the quantum data in later stages. Bit errors
in the quantum ancilla are important, since
the process is repeated $n$ times, hence bit errors
created in the quantum ancilla at an initial
stage of $\mathcal{N}_1$ will spread errors into the next bits
of the classical ancilla. Fortunately, bit errors
are not transmitted from the classical to the quantum
section, and the quantum ancilla cannot be
disturbed by a bit error in bits of the classical
ancilla or the syndrome ancilla. If there is one
error in the $|0\rangle$ bits used to store the syndrome,
it will cause an error in the single classical bit.
But such errors must be overcome by repeating
this circuit $n$ times with fresh syndrome bits
for each repetition. At that point we will have
a repetition code that will successfully recover
from $k'$ errors. Once this number $k'$ is equal to,
or greater than, the number of errors, $k$, that
the quantum code can correct for, we may stop.
For a probability $p$ of an error (per gate, per
input bit, and per delay line) the resulting error
rate of this circuit is $O(p^2)$, as required for
fault-tolerant computation. The threshold can easily
be calculated by counting the potential places
for two errors, and the threshold can be much
improved by enhancing the parallelism, and by
repeating $\mathcal{N}_1$ only $2k + 1$ times (e.g., with the 7-bit
quantum code, that is $n = 7$, which corrects
$k = 1$ error, it is enough to repeat the circuit
3 times, correct the outcome using a majority
vote, and then copy the result into seven bits).

Figure 2: Preparing an eigenvector.

Figure 3: Fault-tolerant $\sigma_z^{1/4}$ without measurement.
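The quadratic suppression obtained from a majority vote can be illustrated with a short calculation. This is only a sketch under a simplifying assumption that is stronger than what the circuit guarantees: each repetition is taken to be wrong independently with probability $p$. The function names are illustrative.

```python
from math import comb

def majority_vote(bits):
    """Return the value held by more than half of the classical bits."""
    return 1 if 2 * sum(bits) > len(bits) else 0

def majority_failure(p, reps):
    """Probability that more than half of `reps` independent copies are wrong,
    each copy being wrong with probability p (independence is an assumption)."""
    need = reps // 2 + 1
    return sum(comb(reps, k) * p**k * (1 - p) ** (reps - k)
               for k in range(need, reps + 1))

# With 2k + 1 = 3 repetitions the failure probability is 3p^2(1 - p) + p^3.
for p in (1e-2, 1e-3):
    print(p, majority_failure(p, 3))
```

For small $p$ the failure probability scales as $3p^2$, consistent with an $O(p^2)$ error rate for the repeated circuit.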

Any required classical reversible fault- 
tolerant calculation can be performed on the 
classical ancilla. Finally, it is used as control 
bits in bitwise operations back onto the quan- 
tum data. 

Creating the special states required 
for fault tolerant universal computation, 
without using a measurement 

Our method is general and can be described
as follows. Assume that a quantum code of
length $n$ is used for encoding data. Suppose
that $U \in U(2^l)$ [for our purpose it is enough
to consider operations on up to three qubits ($l = 3$)],
and $\mathcal{U} = U^{\otimes n}$ is the unitary operation
on the codewords obtained by applying $U$ bitwise.
Suppose that $\mathcal{U}$ has eigenvectors $|\phi_0\rangle$ and
$|\phi_1\rangle$ such that

$$\mathcal{U}|\phi_0\rangle = |\phi_0\rangle \quad \text{and} \quad \mathcal{U}|\phi_1\rangle = -|\phi_1\rangle.$$

Then the quantum circuit in Figure 2 outputs
the eigenvector $|\phi_0\rangle$ if the input state is
$\alpha|\phi_0\rangle + \beta|\phi_1\rangle$ (for any $\alpha$, $\beta$). In this figure $U_{\mathrm{flip}}$
is a unitary operation that maps $|\phi_0\rangle$ to $|\phi_1\rangle$
and vice versa. The operations $\Lambda(U)$ (i.e., the
controlled-$U$) and $H$ are applied bitwise. The
last two controlled operations will be explained
in the sequel.

This scheme is practical if it is possible to
prepare a state $\alpha|\phi_0\rangle + \beta|\phi_1\rangle$, where the
values of $\alpha$ and $\beta$ do not matter. In
this circuit the first line is a single parity bit,
each of the second and third inputs is a block
of $n$ qubits, containing the cat-state lines and
the special-state lines respectively. The third
gate, the controlled-not gate which we call here
$P$, is a Parity gate which calculates the parity
of the cat-state lines and puts the result in the
parity bit. It is done by a sequence of controlled-nots
from each control bit onto one target bit.
The figure only demonstrates the creation of one 
parity bit \cfto} in an unprotected manner (as far 
as a bit error in the parity bit is of concern). 
The real circuit is a bit different: The operations
$\Lambda(U)$, $H$ and $P$ are repeated $n$ times, each time
with fresh cat-states and a fresh parity bit (but on
the same special-state lines). Then a majority
vote is calculated on the parity bits, in order
to reduce the probability that an error in a cat
state or in the parity bit will ruin the result.
Then the $n$ parity results are corrected, so that
the probability of two errors becomes low [that
is, of order $O(p^2)$]. Finally, the parity result is
used to control $U_{\mathrm{flip}}$ in a bit-wise manner, so that
the special state is created via a fault-tolerant
operation.

Fault tolerant $\sigma_z^{1/4}$ without measurement.

Let $B$ be the basis consisting of $H$
(Hadamard), $\sigma_z^{1/2}$, and CNOT. The operations
in $B$ are fault-tolerant, simply because they can
be applied to encoded data bitwise (when standard
codes are used). But $B$ is not universal.
One way to make $B$ universal is to add the Toffoli
gate to it. Another way is to add the gate
$\sigma_z^{1/4}$. The advantages of this
latter set of gates are that it is (a) simple to
implement, (b) simple to prove universal,
and (c) simple to operate with delayed
measurements.

We show here how it is possible to implement
this operation on codewords without using any
measurement. This scheme is a modified version
of the original method for implementing $\sigma_z^{1/4}$
on codewords, and it does not use measurements.

First, we need to prepare the following state:

$$|\psi_0\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_L + e^{i\pi/4}|1\rangle_L\right).$$

This state can be prepared with a circuit of the form
given in Figure 2. For this purpose, let
$U = e^{i\pi/4}\sigma_x\sigma_z\sigma_z^{1/2}$ and
$|\psi_1\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle_L - e^{i\pi/4}|1\rangle_L\right)$.
Then $U|\psi_0\rangle = |\psi_0\rangle$, $U|\psi_1\rangle = -|\psi_1\rangle$, and
$U_{\mathrm{flip}} = \sigma_z$.

Now we are ready to describe the fault-tolerant
$\sigma_z^{1/4}$ without measurement. The
circuit in Figure 3 shows the fault-tolerant
implementation of $\sigma_z^{1/4}$ on a codeword $|x\rangle_L$. In
this circuit, $\mathcal{N}$ is the unitary operation defined
in (1). Apart from replacing the standard measurements
by the $\mathcal{N}$ circuit, this figure is exactly
the same as the one drawn in the original proposal to implement
the $\sigma_z^{1/4}$ gate. In this figure each input in fact
denotes a block of qubits, and operations are
bitwise.
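The eigenvector relations used for the special state can be checked numerically on a single unencoded qubit. The sketch below assumes the reading $U = e^{i\pi/4}\sigma_x\sigma_z\sigma_z^{1/2}$, $|\psi_{0,1}\rangle = (|0\rangle \pm e^{i\pi/4}|1\rangle)/\sqrt{2}$ and $U_{\mathrm{flip}} = \sigma_z$, with $\sigma_z^{1/2} = \mathrm{diag}(1, i)$; the helper names are illustrative.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Apply a 2x2 matrix to a 2-component vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def close(u, v, tol=1e-12):
    return all(abs(x - y) <= tol for x, y in zip(u, v))

sigma_x = [[0, 1], [1, 0]]
sigma_z = [[1, 0], [0, -1]]
sigma_z_half = [[1, 0], [0, 1j]]  # assumed convention: diag(1, i)

phase = cmath.exp(1j * math.pi / 4)
U = [[phase * x for x in row]
     for row in matmul(sigma_x, matmul(sigma_z, sigma_z_half))]

s = 1 / math.sqrt(2)
psi0 = [s, s * phase]
psi1 = [s, -s * phase]

print(close(apply(U, psi0), psi0))                # eigenvalue +1
print(close(apply(U, psi1), [-x for x in psi1]))  # eigenvalue -1
print(close(apply(sigma_z, psi0), psi1))          # sigma_z exchanges the two states
```

The same relations hold for the encoded states when the operators are replaced by their logical counterparts.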

Fault tolerant Toffoli without measurement.

The more conventional (and more complicated)
set of universal fault-tolerant gates contains
the Toffoli gate instead of $\sigma_z^{1/4}$.

We show explicitly how to implement Toffoli
on encoded data without using any measurement.
This scheme is a modified version of
Shor's original method for implementing Toffoli
on codewords [22]. The method is similar to the
one applied to $\sigma_z^{1/4}$.

Figure 4: Fault-tolerant Toffoli without measurement.

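In Shor's method (as in the other bases we
have shown before) a preparation of a special
state is required, hence we first prepare the state

$$|\mathrm{AND}\rangle = \frac{1}{2}\left(|000\rangle_L + |010\rangle_L + |100\rangle_L + |111\rangle_L\right), \qquad (2)$$

without using measurement, based on our "creating
a special state" technique.

To get $|\mathrm{AND}\rangle$ we let $U = \Lambda(\sigma_z) \otimes \sigma_z$, and we
choose

$$|\overline{\mathrm{AND}}\rangle = \frac{1}{2}\left(|001\rangle_L + |011\rangle_L + |101\rangle_L + |110\rangle_L\right).$$

Then $U|\mathrm{AND}\rangle = |\mathrm{AND}\rangle$, $U|\overline{\mathrm{AND}}\rangle = -|\overline{\mathrm{AND}}\rangle$,
$U_{\mathrm{flip}} = I \otimes I \otimes \sigma_x$, and

$$\frac{1}{\sqrt{2}}\left(|\mathrm{AND}\rangle + |\overline{\mathrm{AND}}\rangle\right) = (H \otimes H \otimes H)\,|000\rangle_L.$$

A different solution to this step was given (independently)
by D. Aharonov and M. Ben-Or [1].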

Now we are ready to describe the fault-tolerant
Toffoli without measurement. This procedure
is presented in Figure 4. In this circuit $\mathcal{N}$
is the unitary operation defined in (1); apart from
replacing the standard measurements by our $\mathcal{N}$
circuit, this figure is exactly the same as the one
drawn by Preskill to describe Shor's way of
obtaining the Toffoli gate.

Note that in this figure each input represents 
a block of qubits and operations on these blocks 
are defined in the natural way. Also note that 
the first three top outputs of this circuit are in 
a tensor product with the rest of the outputs. 

4 Quantum algorithms 

Here we study different known quantum algo- 
rithms that cannot be implemented directly on 
ensemble quantum computers and we provide 
modifications to make them suitable for such 
computers. 

4.1 The factorization algorithm 

In Shor's factorization algorithm the aim is to factor a large number n. To do so, one uses a random number x and tries to find the least positive integer r such that $x^r \equiv 1 \pmod{n}$. This least r is the order of x mod n, and n can be factored with high probability once r is known.
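As an illustration of why knowing the order suffices, here is a small classical sketch. The function names `order` and `factor_from_order` are hypothetical, and the order is found by brute force (which takes exponential time in the bit length of n); only the final gcd step corresponds to the classical post-processing in Shor's algorithm.

```python
from math import gcd

def order(x, n):
    """Least r >= 1 with x**r == 1 (mod n), by brute force."""
    if n < 2 or gcd(x, n) != 1:
        raise ValueError("need n >= 2 and gcd(x, n) == 1")
    r, y = 1, x % n
    while y != 1:
        y = (y * x) % n
        r += 1
    return r

def factor_from_order(x, r, n):
    """Try to split n using the order r of x; return a factor pair or None."""
    if r % 2 == 1:
        return None                 # odd order: this x does not help
    h = pow(x, r // 2, n)
    if h == n - 1:
        return None                 # x^(r/2) == -1 (mod n): this x does not help
    p = gcd(h - 1, n)               # h^2 == 1 and h != +-1, so p is a proper divisor
    return tuple(sorted((p, n // p)))

r = order(2, 21)
print(r, factor_from_order(2, r, 21))   # prints: 6 (3, 7)
```

When the helper returns `None`, one would pick another random x and repeat.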

Shor's algorithm does not yield r directly (in the quantum process). Instead, another integer c is the actual outcome of the quantum protocol, from which the right r can sometimes be obtained by a classical algorithm. Let us call the outcome of the classical algorithm r'; in at least an $O(1/\log\log n)$ fraction of the cases, the number r' is the desired r, and whether this is the case or not is checked via a classical algorithm. Let the probability of a correct result (on an individual computer) be $p_r$. While the order r (for a given x and n) is unique, the result c and the calculated r' are not unique. Having several good outcomes $c_i$ does not cause a problem (as noted in [?]), since the quantum computer can perform a classical algorithm which calculates r from any of the possible $c_i$. However, this operation by itself is not sufficient, since many of the computers (probably, most of them) give an outcome r' which is not the correct r. When expectation values are measured for the jth bit, the correct result $r_j$ happens with small probability $p_r$, and hence it is obscured by the wrong results $r'_j$.

If the measurement process could distinguish among $2^n$ states of an n-qubit system (which would require exponential resolution), then one could read the correct result accurately. However, such an operation is not permitted, and hence the technique of [?] is not sufficient. Another potential situation, which could also lead to a simple resolution, is if the wrong-r results are well distributed (e.g., totally random); in such a case, on average these wrong-r results will cancel out (e.g., average to yield zero) and will not obscure the correct result. Let us show that this is not always the case, and that the bad results are not always averaged to zero, and hence the good result sometimes is indeed obscured.

The output c of the quantum process in Shor's algorithm is used to calculate the order r [21]. For this, the integers d' and r' are found such that

$$\left|\frac{c}{q} - \frac{d'}{r'}\right| < \frac{1}{2q},$$

where $n^2 \le q < 2n^2$, and q is a power of 2. Then the fraction d'/r' is unique. The integer r' is the output of the algorithm as the desired order (which is actually r). To continue, let a(c) be the unique integer such that $-q/2 < a(c) \le q/2$ and $rc \equiv a(c) \pmod{q}$. One of the possible situations that leads to an incorrect answer is that the output c of the quantum process satisfies the condition

$$\left|\frac{c}{q} - \frac{d}{r}\right| < \frac{1}{2q}$$

and d and r are not relatively prime. Then the answer, instead of r, would be a divisor of r. The probability that such an event occurs is (see [?]) approximately $4(r - \varphi(r))/(\pi^2 r)$. This probability can be some constant far away from zero. For example, if $r = 2^s 3^t$, then $\varphi(r) = r/3$ and the probability that the algorithm provides a divisor of r is $\approx 0.135$.
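The divisor-of-r failure mode can be seen in a small classical sketch of the post-processing step. It assumes the toy instance n = 21, x = 2 (so r = 6) and q = 512, and it uses Python's `Fraction.limit_denominator` as a stand-in for the continued-fraction expansion; `candidate_order` and `peak` are illustrative names, not part of the original protocol.

```python
from fractions import Fraction

def candidate_order(c, q, n):
    """Denominator r' of the fraction d'/r' with r' < n that is closest to c/q."""
    return Fraction(c, q).limit_denominator(n - 1).denominator

def peak(d, r, q):
    """Integer c closest to d*q/r, i.e. an ideal measurement outcome for this d."""
    return (2 * d * q + r) // (2 * r)

n, x, r, q = 21, 2, 6, 512          # n^2 = 441 <= q < 882, q a power of 2
candidates = [candidate_order(peak(d, r, q), q, n) for d in range(r)]
print(candidates)                   # prints: [1, 6, 3, 2, 3, 6]

# Only the outcomes with gcd(d, r) = 1 pass the classical verification x^r' = 1 (mod n).
verified = [rp for rp in candidates if pow(x, rp, n) == 1]
print(verified)                     # prints: [6, 6]
```

In this toy case only d = 1 and d = 5 return the true order; every other d returns a proper divisor of 6, which the verification step rejects.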

Let us now present a modified factorization protocol that bypasses this ensemble measurement problem. The idea is to replace an additional part of the classical protocol, a part which verifies that r is indeed the order, by a quantum one. Also, a simple (but crucial) modification of the protocol is required. Let the register holding the result (r or r') be called $s_1$. Let us use an additional register $s_2$ of the same number $\ell$ of qubits as $s_1$. Let the register $s_2$ be in the state

$$H|0\rangle \otimes H|0\rangle \otimes \cdots \otimes H|0\rangle = \frac{1}{2^{\ell/2}} \sum_{x \in \{0,1\}^{\ell}} |x\rangle, \qquad (3)$$

where $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. Now we augment the quantum factorization algorithm with the following procedure. When the original factorization algorithm finishes, test the result in the register $s_1$ to see whether it gives the correct value of the order r. If the result on the ith computer is indeed the order then nothing is to be done and the outcome r is kept in $s_1$. Whenever the result is an incorrect value r', swap the contents of the registers $s_1$ and $s_2$ so that the outcome r' is replaced by the state $H|0\rangle \otimes \cdots \otimes H|0\rangle$, which yields a completely randomized outcome once it is measured. Now, a measurement of the jth bit on $s_1$ will give the correct result if the string holds the state r, or it yields zero (on average) if the string originally contained the wrong result r'.

Although the strength of the good signal may 
be small, there are enough computers running 
in parallel to read it since in the worst case, it 
is only logarithmically small. 
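The effect of the swap on the ensemble readout can be illustrated with a small classical model. This is a sketch under stated assumptions: the outcome probabilities below are made up for illustration, each bit is scored +1 for 1 and -1 for 0, and the function names (`bit_signals`, `randomize_wrong`, `decode`, `is_order`) are hypothetical.

```python
def bit_signals(dist, num_bits):
    """Ensemble expectation of each bit (most significant first), +1 for 1 and -1 for 0."""
    return [
        sum(p * (1 if (value >> j) & 1 else -1) for value, p in dist.items())
        for j in reversed(range(num_bits))
    ]

def randomize_wrong(dist, is_correct, num_bits):
    """Model of the swap step: every wrong outcome is replaced by a uniformly random string."""
    size = 2 ** num_bits
    out = {v: 0.0 for v in range(size)}
    for value, p in dist.items():
        if is_correct(value):
            out[value] += p
        else:
            for v in out:
                out[v] += p / size
    return out

def decode(signals):
    """Read each bit from the sign of its ensemble signal."""
    return int("".join("1" if s > 0 else "0" for s in signals), 2)

def is_order(r, x=2, n=21):
    """Classical verification that r is the order of x mod n."""
    return r > 0 and pow(x, r, n) == 1 and all(pow(x, d, n) != 1 for d in range(1, r))

# Hypothetical per-computer outcome distribution: the true order 6 plus two divisors.
outcomes = {6: 0.4, 3: 0.35, 2: 0.25}
raw = decode(bit_signals(outcomes, 3))
fixed = decode(bit_signals(randomize_wrong(outcomes, is_order, 3), 3))
print(raw, fixed)   # prints: 2 6
```

Without the swap the wrong outcomes outvote the correct one on two of the three bits; with it, every bit signal has magnitude 0.4 and the sign of the correct bit.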

4.2 Algorithms for NP functions 

The technique used in the previous section for Shor's algorithm can easily be generalized to any quantum algorithm that computes an NP function. By an NP function we mean a function whose graph is in the class P. More specifically, a function $f : \Sigma^* \to \Sigma^*$, for some alphabet $\Sigma$, such that there is a polynomial-time Turing machine that for $x, y \in \Sigma^*$ decides whether $f(x) = y$ or not.

4.3 The search algorithm 

Certain search operations in a database can be done more efficiently on a quantum computer than on a classical computer [11]. Here the search means to find some item x in the database such that x satisfies some predefined condition T; i.e., we are looking for the solutions of T(x) = 1. The analysis of [3] shows that if the size of the database is N and the number of solutions is t, Grover's algorithm, with high probability, can find a solution in time $O(\sqrt{N/t})$. When there is only one solution, this algorithm yields the desired result also on an ensemble computer.

However, when several (say $t \ge 2$) different items satisfy the required condition, the protocol will randomly yield one of them. Therefore, in this case the algorithm is not suitable for ensemble computation. We show here how this algorithm can be modified such that ensemble computation still provides a correct solution with high probability.

We assume t, the number of solutions, is known and constant (the general case will be studied in the next section). We first consider the case t = 2. When processed on an ensemble-measurement computer, only expectation values are obtained, and the two outcomes partially obscure each other to yield zero (as the average expected value) for the jth bit of the answer if the jth bits of the two solutions are different.

To solve this problem we suggest holding several (say m) computers in one molecule. After each computer in the molecule finishes Grover's algorithm, the procedure is continued by sorting the outputs of the different computers in increasing order. Finally, let the algorithm contain a step where the first and the last results are compared, and if they are equal then both are replaced by randomized data (3), as in the modified Shor's algorithm. Once the first and last computers hold different outcomes, we are promised that the small solution is always the first, and that the large solution is always the last. Thus, we can obtain both solutions.

The probability that the first and the last solutions are the same is $1/2^{m-1}$, so the final outcome is obtained with probability exponentially close to one. Even without applying the randomization to the bad outcomes, the expected outcomes are still readable.

When t > 2, we apply the same procedure (without randomization to the bad outcomes). We still reorder the solutions so that the minimal solution is in the first position. However, we might obtain different minimal solutions for different molecules. The probability of failing to obtain the global minimum solution in the first position is $(1 - 1/t)^m$, and as long as it is small (say less than $e^{-\lambda}$, which holds if $m > \lambda t$) the protocol can work properly. Note that this modified algorithm still works in time $O(\sqrt{N/t})$.

Only the smallest and largest solutions can 
be obtained by the above method. If one needs 
the other solutions, these can easily be obtained 
via similar methods, once some solutions are al- 
ready known. 
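The two failure probabilities of the sorting procedure can be checked by exhaustive enumeration for small cases. The sketch below assumes that each of the m computers returns one of the t solutions uniformly and independently (an idealization of Grover's output); `sorted_outcome_stats` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

def sorted_outcome_stats(solutions, m):
    """Exact probabilities, over all t**m outcome tuples, that after sorting
    (a) the first and last entries coincide, (b) the first entry is not the global minimum."""
    total = len(solutions) ** m
    same = miss = 0
    for outcome in product(solutions, repeat=m):
        ordered = sorted(outcome)
        if ordered[0] == ordered[-1]:
            same += 1
        if ordered[0] != min(solutions):
            miss += 1
    return Fraction(same, total), Fraction(miss, total)

same2, miss2 = sorted_outcome_stats([3, 9], m=5)
same3, miss3 = sorted_outcome_stats([3, 9, 12], m=4)
print(same2, miss3)   # prints: 1/16 16/81
```

For t = 2 and m = 5 the first and last entries coincide with probability $1/2^{m-1} = 1/16$, and for t = 3 and m = 4 the global minimum is missed with probability $(1 - 1/t)^m = 16/81$.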

4.4 Search algorithm: the case of unknown number of solutions

Now we consider the most general case. Here we do not assume any condition on t, the number of solutions; it can be known or unknown, large or even zero. Our method is based on a binary search. We also utilize the following fact established in [3]: Let B be a database of size M; then the search algorithm, with high probability, starting with the input $\frac{1}{\sqrt{M}} \sum_{x \in B} |x\rangle$, can determine in time $O(\sqrt{M})$ whether there is any solution in B or not.

Without loss of generality, we can assume that the database is represented as the members of the unit cube $V = \{0,1\}^n$. So $N = 2^n$. For any string $\alpha = (\alpha_1, \ldots, \alpha_k) \in \{0,1\}^k$, let $V_\alpha$ be the subset of V consisting of all strings $(\alpha_1, \ldots, \alpha_k, x_{k+1}, \ldots, x_n)$; i.e., $V_\alpha$ contains all strings in V that start with $\alpha$. Thus $|V_\alpha| = 2^{n-k}$.

Our algorithm first checks whether there is a solution or not. If there is no solution then it stops. Otherwise it runs in n stages. The output of stage j is a database $B_j$ of size $2^{n-j}$ which contains a solution. At the end $B_n = \{\xi\}$, where $\xi$ is a solution. The algorithm starts with the database $B_0 = V$. It checks whether there is any solution in $V_0$. If there is a solution then $B_1 = V_0$, otherwise $B_1 = V_1$. In a general stage j + 1, the input is of the form $B_j = V_{\alpha_j}$, where $\alpha_j \in \{0,1\}^j$, and there is a solution in $B_j$. Then the algorithm checks whether there is a solution in $V_{\alpha_j 0}$; if so then the output of this stage is $B_{j+1} = V_{\alpha_j 0}$, otherwise the output is $B_{j+1} = V_{\alpha_j 1}$. This completes the description of our search algorithm. It is easy to check that this algorithm always provides the first solution in the lexicographic order. So we have presented a quantum search algorithm that always gives a unique output, no matter how many solutions there are. This is an algorithm which can be implemented on an ensemble-measurement computer. Note that the running time of this algorithm is

$$O\left(\sqrt{2^n} + \sqrt{2^{n-1}} + \cdots + \sqrt{2}\right) = O(\sqrt{N}).$$
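The stage structure of this prefix search can be sketched classically. In the code below the existence test `has_solution` is a brute-force placeholder for the quantum $O(\sqrt{M})$ test (an assumption made so that the example runs on an ordinary computer), and both function names are illustrative.

```python
from itertools import product

def has_solution(prefix, n, T):
    """Placeholder existence test: is there a solution among the strings that start with prefix?"""
    rest = n - len(prefix)
    return any(T(prefix + "".join(tail)) for tail in product("01", repeat=rest))

def first_solution(n, T):
    """Return the lexicographically first n-bit string x with T(x) true, or None."""
    if not has_solution("", n, T):
        return None                      # no solution at all: stop
    prefix = ""
    for _ in range(n):                   # stage j+1 extends the prefix by one bit
        prefix += "0" if has_solution(prefix + "0", n, T) else "1"
    return prefix

solutions = {"0101", "1011", "1100"}
print(first_solution(4, lambda x: x in solutions))   # prints: 0101
```

Because every computer in the ensemble follows the same deterministic sequence of prefixes, all of them end with the same string, which is what makes the output readable by expectation-value measurements.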

5 Error recovery in the error-correction process

Standard error correction can be viewed as a 
computation with more than one good answer, 
and thus belongs to Case (1) discussed in Sec- 
tion 2. In this case, the syndrome of the error is 
not unique. In the standard prescription, mea- 
surement is used to collapse the ancilla qubits 
containing the error information. Then these 
syndrome bits are processed by a classical reversible algorithm to determine the errors, and a unitary operation that corrects the error is applied to the data qubits, controlled by the output bits of the classical algorithm. In the measurement-free case,
the ancilla qubits need not be measured, and 
the classical subroutine (following the measure- 
ment) could be incorporated into the original 
quantum algorithm. 
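The measurement-free prescription can be illustrated on classical bits with the three-bit repetition code. This is a sketch only: it tracks bit values rather than quantum states, the code choice is an assumption made for brevity, and `measurement_free_correct` is a hypothetical name. The syndrome is computed into two ancilla bits with CNOT-type operations and is never read out; the correction is applied by doubly-controlled NOT (Toffoli-type) operations whose controls are the syndrome bits, possibly negated.

```python
from itertools import product

def measurement_free_correct(codeword):
    """Correct at most one bit flip in a 3-bit repetition codeword using reversible gates only."""
    b0, b1, b2 = codeword
    # Syndrome extraction: CNOTs from the data bits onto two ancillas initialised to 0.
    s0 = b0 ^ b1
    s1 = b1 ^ b2
    # Correction: each data bit is flipped by a doubly-controlled NOT on the ancillas.
    b0 ^= s0 & (1 ^ s1)      # syndrome (1, 0): error on bit 0
    b1 ^= s0 & s1            # syndrome (1, 1): error on bit 1
    b2 ^= (1 ^ s0) & s1      # syndrome (0, 1): error on bit 2
    return (b0, b1, b2), (s0, s1)

for logical in (0, 1):
    for error in [e for e in product((0, 1), repeat=3) if sum(e) <= 1]:
        noisy = tuple(logical ^ e for e in error)
        corrected, syndrome = measurement_free_correct(noisy)
        print(noisy, syndrome, corrected)
```

Different errors leave different values in the ancilla bits, which is the sense in which the syndrome is not unique, while the data bits always return to the same codeword.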

One can easily verify that the above-mentioned classical subroutine (that processes the ancilla qubits) needs to use Toffoli gates. Using the techniques of Section 3 one could implement a quantum Toffoli gate without measurements, and hence there is no fundamental problem in having a single quantum code for the measurement-free circuit. However, implementing a quantum Toffoli gate fault tolerantly and without measurement is an involved process. Fortunately, the techniques of Section 3 can also be applied so that the classical subroutine is carried out on a classical code. The state of the ancilla qubits can first be copied onto a classical repetition code using the M gate. Now classical reversible computation can be performed on the repetition code and then a control operation can be performed on the quantum data to correct for the errors. Since phase errors from the classical subcircuit will not propagate to the quantum data, using repetition codes to correct for any bit errors in the subcircuit is sufficient. This technique thus allows one to fault-tolerantly replace quantum Toffoli gates by classical ones in the error recovery process.

6 Concluding remarks

To summarize, we showed that running algorithms on bulk (ensemble) computers is not always obvious. We modified various important algorithms so that they can run on ensemble computers.

More work is required in order to run algorithms without measurement with only near-neighbor interactions, and more work is required to solve the addressing and scaling problems.

We are thankful to Dorit Aharonov for many helpful remarks.

References

[1] D. Aharonov and M. Ben-Or, "Fault-Tolerant Quantum Computation with Constant Error," Proc. of the 29th Annual ACM Symposium on Theory of Computing (STOC), pp. 46-55, 1997.

[2] D. Aharonov and M. Ben-Or, "Fault-Tolerant Quantum Computation With Constant Error Rate", the journal version of [1], Los-Alamos archive: quant-ph/9906129.


[3] M. Boyer, G. Brassard, P. Hoyer and A. 
Tapp, "Tight bounds on quantum search- 
ing," Fortschritte der Physik, 46(1998), pp. 
493-505. 

[4] P. O. Boykin, T. Mor, M. Pulver, V. Roychowdhury, and F. Vatan, "On universal and fault-tolerant quantum computation," Los-Alamos archive: quant-ph/9906054. To appear in Proc. 40th IEEE Ann. Symposium on Foundations of Computer Science (FOCS), 1999.

[5] G. Brassard, S. Braunstein, and R. Cleve, "Teleportation as a quantum computation", Physica D, 120(1998), pp. 43-47.

[6] D. Cory, M. Price, W. Maas, E. Knill,
R. Laflamme, W. Zurek, T. Havel, and 
S. Somaroo, "Experimental quantum er- 
ror correction", Physical Review Letters, 
81(1998), pp. 2152-2155. 

[7] D. DiVincenzo, "Real and realistic quan- 
tum computation", Nature, 393(1998), pp. 
113-114. 

[8] D. DiVincenzo and P. Shor, "Fault-tolerant 
error correction with efficient quantum 
codes," Physical Review Letters, 77(1996), 
pp. 3260-3263. 

[9] N. A. Gershenfeld and I. L. Chuang, 
"Bulk spin-resonance quantum computa- 
tion," Science, 275(1997), pp. 350-356. 

[10] D. G. Cory, A. F. Fahmy, and T. F. Havel, 
"Ensemble quantum computing by nuclear 
magnetic resonance spectroscopy," in Proc. 
Natl. Acad. Sci. 94(1997), pp. 1634-1639. 






[11] L. Grover, "A fast quantum mechanical al- 
gorithm for database search," in Proceed- 
ings of 28th ACM Symposium on Theory 
of Computing, pp. 212-219, 1996. 

[12] J. A. Jones, M. Mosca, and R. H. Hansen, "Implementation of a quantum search algorithm on a quantum computer", Nature, 393(1998), pp. 344-346.

[13] A. Kitaev, "Quantum Computations: Al- 
gorithms and Error Correction", Russian 
Math. Surveys 52(1997), pp. 1191-1249. 

[14] E. Knill, R. Laflamme, and W. H. Zurek, "Accuracy Threshold for Quantum Computation", Los Alamos archive: quant-ph/9610011.



[15] E. Knill, R. Laflamme, and W. H. Zurek, 
"Resilient quantum computation: error 
models and thresholds," Proceedings of 
the Royal Society of London, Series A, 
454(1998), pp. 365-384. 

[16] S. Lloyd, "Universal quantum simulators," 
Science, 273(1996), pp. 1073-1078. 

[17] M. A. Nielsen, E. Knill, and R. Laflamme, 
"Complete quantum teleportation using 
nuclear magnetic resonance", Nature, 
396(1998), pp. 52-55. 

[18] A. Peres, "Quantum disentanglement 
and computation," Superlattices and Mi- 
crostructures, 23(1998), pp. 373-379. 

[19] J. Preskill, "Reliable quantum computers," 
Proc. of the Royal Society of London, Ser. 
A, 454(1998), pp. 385-410. 

[20] L. J. Schulman and U. Vazirani, "Scalable NMR quantum computing," LANL e-print, quant-ph/9804060, 1998.

[21] P. Shor, "Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer," SIAM J. Computing, 26(1997), pp. 1484-1509.

[22] P. Shor, "Fault-tolerant quantum computation," in Proc. 37th IEEE Ann. Symposium on Foundations of Computer Science, pp. 56-65, 1996.

[23] W. S. Warren, "The usefulness of NMR 
quantum computing", Science, 277(1997), 
pp. 1688-1689. 

APPENDIX A 

As mentioned in the introduction, in NMR 
computing, each molecule is used as a computer, 
and different qubits in one computer are spins 
of different nuclei. Many identical molecules are 
used (a macroscopic number) in parallel. More- 
over, the state of the qubits is initially a thermal 
mixture. 

There are three main problems with the current proposals for NMR computers [23, ?]: the
ensemble-measurement problem, the address- 
ing problem, and the pseudo-pure-state scaling 
problem. Unless these problems can be solved 
or mitigated, it is widely believed that NMR 
computing will not be very useful as a future 
computing device. As we demonstrate in this 
paper, the ensemble-measurement problem can 
be addressed successfully. While the other two 
problems are challenging, we argue in the fol- 
lowing paragraphs that recent advances do hold 
the promise of mitigating their effects, and that 
further research is required before one can con- 
clude whether NMR/ensemble quantum com- 
puting can indeed be scaled up to perform prac- 
tical quantum computation. 

The addressing problem: An individual qubit in an NMR computing system cannot be accessed by a laser directed only at it, and hence a different level separation is usually used for each of the qubits. For n qubits (with only near-neighbor interactions) there is a need for $O(n)$ different laser frequencies, and off-resonance effects become non-negligible. A solution to this problem was suggested in [16], where a chain of three different types of qubits, arranged in the form ABCABCABC...ABC, is used. In this chain one (and only one) of the qubits, say of type B, is replaced by a pointer-qubit of a fourth type D. Now, with only five swap operations, swap(AB), swap(CA), swap(BC), swap(AD), swap(DC), and with three operations on the pointer and its neighborhood, namely individual qubit rotations R(D) and two-qubit operations U(AD) and U(CA), algorithms can run with only a polynomial slowdown. Thus, in principle, universal quantum computation can be performed on an NMR system with only a constant number of laser frequencies.

The pseudo-pure-state scaling problem: The state of the qubits in an NMR computer is highly mixed. It is a thermal mixture so that each qubit is in the state $|0\rangle$ with probability $\frac{1+\varepsilon}{2}$ and in the state $|1\rangle$ with probability $\frac{1-\varepsilon}{2}$, where $\varepsilon$ is a function of the temperature and the applied strong magnetic field. For the quantum computation model, however, it is assumed that initially all its qubits are in a known state, which, without loss of generality, is assumed to be the state $|0\rangle$. In the existing literature (and current experiments), a novel purification technique was used, which creates a "pseudo-pure state", that is, a state which can be written as a mixture of the identity and a pure state. Then, the algorithmic steps are performed on the "pseudo-pure state". While this ingenious technique allows one to perform entanglement manipulation and demonstrate quantum algorithms involving a few qubits, it has an inherent limitation. In particular, there is an information loss in the process of mixing (via a non-unitary operation) of all eigenstates except the state $|000\ldots0\rangle$ (see, for example, [?] for detailed explanations), leading to an exponential decrease in signal-to-noise ratio with the increase in the number of qubits. Hence, the current pseudo-pure state approaches cannot be scaled up, and thus they lose any potential advantage over classical computers. It is worth observing, however, that the exponential loss of signal is an artefact of the existing pseudo-pure state approaches, and is not inherent to NMR or ensemble quantum computing. For example, a simple information-theoretic analysis suggests that $k = O(n\varepsilon^2)$ pure qubits can be distilled from n thermal qubits, which are highly mixed. This idea was analyzed further in [20], where an algorithm for extracting $O(n\varepsilon^2)$ pure qubits from a thermal mixture of n qubits was suggested. While the solution of [20] is good only when n is large compared to $\varepsilon^2$, it clearly proves the point that methods for creating much better pseudo-pure states probably exist, and that the scaling problem should certainly not discourage scientists from pursuing ensemble quantum computation. We are currently working on this problem and the initial results are very promising.
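One way to see where a yield of order $n\varepsilon^2$ comes from is the standard entropy count, sketched below under the assumption that this is the kind of information-theoretic analysis meant above: n thermal qubits carry entropy $nH\big(\frac{1+\varepsilon}{2}\big)$, so at most $n\big(1 - H\big(\frac{1+\varepsilon}{2}\big)\big)$ pure qubits can be extracted, and for small $\varepsilon$ this is close to $n\varepsilon^2/(2\ln 2)$. The function names are illustrative.

```python
from math import log, log2

def binary_entropy(p):
    """Shannon entropy (in bits) of a biased coin with bias p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def purifiable_fraction(eps):
    """Entropy deficit per thermal qubit, 1 - H((1 + eps) / 2): an upper bound on k / n."""
    return 1.0 - binary_entropy((1 + eps) / 2)

def small_eps_estimate(eps):
    """Leading-order term eps**2 / (2 ln 2) of the entropy deficit."""
    return eps ** 2 / (2 * log(2))

for eps in (0.1, 0.01):
    exact = purifiable_fraction(eps)
    approx = small_eps_estimate(eps)
    print(eps, exact, approx, exact / approx)
```

The ratio in the last column approaches 1 as $\varepsilon$ decreases, consistent with a yield that scales as $n\varepsilon^2$.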